United States Public Elementary-Secondary Education spending (2016)

by Z. McLaughlin

Introduction

There has been a lot of media attention on teacher salaries. This exploration covers various elements of how money is spent on education. One area of particular interest is spending in Oklahoma, which has gotten a lot of attention in the news recently. The exploration covers overall trends in spending per student across the country, then focuses on Oklahoma vs California vs New Jersey to see how they compare.

Example YouTube video of teachers in Oklahoma changing jobs.

Dataset

There is a government census done of all public schools in the United States. This data for 2016 is published here:

2016 Public Elementary-Secondary Education Finance Data

Since none of the provided tables contained all the desired information, created a cleaned dataset using the following information:

Tables attached to project assignment:

Cleaned data created has the following column headers:

After cleaning the data, decided to only include school districts with at least 50 students enrolled.
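The enrollment cutoff can be sketched as a simple filter. This is a minimal Python illustration, not the code used for this report: the sample rows and the `filter_by_enrollment` helper are hypothetical, and only the `ENROLL` column name and the 50-student threshold come from the text.

```python
MIN_ENROLL = 50  # threshold stated in the text


def filter_by_enrollment(districts, min_enroll=MIN_ENROLL):
    """Keep only districts whose ENROLL value is at least min_enroll."""
    return [d for d in districts if d["ENROLL"] >= min_enroll]


# Hypothetical sample rows, for illustration only (not real districts).
sample = [
    {"NAME": "District A", "ENROLL": 12},
    {"NAME": "District B", "ENROLL": 50},
    {"NAME": "District C", "ENROLL": 1163},
]

kept = filter_by_enrollment(sample)
print([d["NAME"] for d in kept])
```

The comparison is inclusive, so a district with exactly 50 students stays in, which matches the minimum of 50.0 shown for `ENROLL` in the summary.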

Added a few more columns later in the exploration:

Note: Teacher compensation = Salary + benefits

Summary of cleaned data

##      ENROLL            TOTALREV           TFEDREV       
##  Min.   :    50.0   Min.   :     234   Min.   :      0  
##  1st Qu.:   452.5   1st Qu.:    6388   1st Qu.:    347  
##  Median :  1163.0   Median :   16328   Median :    920  
##  Mean   :  3724.4   Mean   :   51223   Mean   :   3961  
##  3rd Qu.:  3059.5   3rd Qu.:   42748   3rd Qu.:   2590  
##  Max.   :981667.0   Max.   :27448356   Max.   :1739101  
##      TSTREV            TLOCREV            TOTALEXP       
##  Min.   :       0   Min.   :       0   Min.   :     301  
##  1st Qu.:    2884   1st Qu.:    2206   1st Qu.:    6245  
##  Median :    7775   Median :    6105   Median :   15962  
##  Mean   :   24103   Mean   :   23159   Mean   :   50922  
##  3rd Qu.:   18936   3rd Qu.:   18391   3rd Qu.:   42114  
##  Max.   :10568010   Max.   :15141245   Max.   :29620098  
##     TCURISAL           TCURIBEN          PPCSTOT          PPSALWG      
##  Min.   :       0   Min.   :      0   Min.   :     0   Min.   :     0  
##  1st Qu.:    2030   1st Qu.:    692   1st Qu.:  9455   1st Qu.:  5433  
##  Median :    5133   Median :   2058   Median : 11035   Median :  6347  
##  Mean   :   16947   Mean   :   6928   Mean   : 12796   Mean   :  7237  
##  3rd Qu.:   13947   3rd Qu.:   5795   3rd Qu.: 14345   3rd Qu.:  8031  
##  Max.   :10044302   Max.   :6258743   Max.   :374873   Max.   :183063  
##     PPEMPBEN        PPITOTAL         PPISALWG         PPIEMBEN    
##  Min.   :    0   Min.   : -1472   Min.   :     0   Min.   :    0  
##  1st Qu.: 1827   1st Qu.:  5637   1st Qu.:  3641   1st Qu.: 1200  
##  Median : 2552   Median :  6566   Median :  4269   Median : 1669  
##  Mean   : 2981   Mean   :  7606   Mean   :  4861   Mean   : 1995  
##  3rd Qu.: 3633   3rd Qu.:  8554   3rd Qu.:  5426   3rd Qu.: 2490  
##  Max.   :76688   Max.   :228102   Max.   :131903   Max.   :55750

The data varies a lot:

Univariate Plots Section

Histogram exploration

Investigating how large the school districts are (note that each district may contain a different number of separate schools). Also investigating what per student spending looks like. The median is marked with a red dashed line.

Bivariate Plots

GGPAIRS matrix - Getting some basic correlation data

Plots of enrollment vs expenses

The relationship between enrollment and total expenses looks very linear at the lower numbers, but as enrollment goes up, the relationship scatters. Even though the relationship is fairly linear, there is quite a bit of variation and some outliers.

Note that at this point decided to look mainly at districts with fewer than 125,000 students, rather than districts like NYC, which has almost a million students, or Hawaii, where the whole state is one district.

Plots of enrollment vs total per student spending

The horizontal line on the plots above is the median. It appears that spending per student at the lower enrollments hovers around the median, but there are quite a few school districts where the spending is substantially higher. Starting to bring up questions:

Plots of enrollment vs revenue sources (Federal, State, Local)

Added a red dashed line for enrollment = 10,000 students to make it easier to compare the various sources of funding.

Where the money comes from looks interesting. State funding looks pretty linear with enrollment. It looks like most money for education comes from state and local sources.

Bivariate Plots Section 2

Based on the plots above there is a need to create additional data to get more specific plots.

##      PPTISB           PPPOI          PLOCALREV     
##  Min.   :     0   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:  5004   1st Qu.:0.5002   1st Qu.:0.2696  
##  Median :  5882   Median :0.5420   Median :0.3997  
##  Mean   :  6856   Mean   :0.5371   Mean   :0.4307  
##  3rd Qu.:  7757   3rd Qu.:0.5803   3rd Qu.:0.5817  
##  Max.   :187653   Max.   :2.1406   Max.   :0.9904  
##                   NA's   :2

The summary showed proportions over 1 (that is, over 100%) for the share of total per pupil spending that goes to salaries, wages and benefits. Did some spot checks and found that only a handful of school districts were impacted. This brings up questions about the reliability of the data, but since this is not for professional use, will not investigate further.
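The formulas behind the three added columns are not spelled out in the text, so the sketch below rests on assumed definitions inferred from the column names and the printed summaries. For example, the mean of `PPTISB` (6856) equals the sum of the means of `PPISALWG` (4861) and `PPIEMBEN` (1995). The helper names and the sample rows are hypothetical.

```python
import math


def add_derived_columns(row):
    """Return a copy of row with three derived columns (assumed definitions)."""
    out = dict(row)
    # Assumed: per pupil instructional salaries plus instructional benefits.
    out["PPTISB"] = row["PPISALWG"] + row["PPIEMBEN"]
    # Assumed: share of per pupil current spending that goes to PPTISB.
    # A zero denominator gives NaN, similar to the NA's reported in the summary.
    out["PPPOI"] = out["PPTISB"] / row["PPCSTOT"] if row["PPCSTOT"] else math.nan
    # Assumed: share of total revenue that comes from local sources.
    out["PLOCALREV"] = row["TLOCREV"] / row["TOTALREV"] if row["TOTALREV"] else math.nan
    return out


def flag_impossible_shares(rows):
    """Return rows whose PPPOI is above 1, which a true share cannot be."""
    return [r for r in rows if not math.isnan(r["PPPOI"]) and r["PPPOI"] > 1]


# Hypothetical rows, for illustration only.
raw = [
    {"PPISALWG": 4000, "PPIEMBEN": 2000, "PPCSTOT": 12000, "TLOCREV": 4000, "TOTALREV": 10000},
    {"PPISALWG": 9000, "PPIEMBEN": 3000, "PPCSTOT": 8000, "TLOCREV": 100, "TOTALREV": 1000},
    {"PPISALWG": 0, "PPIEMBEN": 0, "PPCSTOT": 0, "TLOCREV": 0, "TOTALREV": 500},
]

derived = [add_derived_columns(r) for r in raw]
flagged = flag_impossible_shares(derived)
print(len(flagged), "row(s) with a share above 1")
```

A check like `flag_impossible_shares` is one way to do the spot check described above without scanning the rows by hand.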

Plots Looking at percentage of local rev vs other variables

Had expected to maybe be able to see a trend, but did not see anything useful. Decided to look at a subset of data with a limited number of states.

Multivariate Plots Section

Plots Looking at data for a subset of states

Starting to see some noticeable differences:

Plot Looking at just 1 state (New York) a little deeper

It does appear that total per student spending goes up when the percent of funding from local revenue for the school district goes up.

Narrowing down to instructional salary and benefits

Looking at a few more factors.

Narrowing down to just 3 states, NJ, CA, & OK

CA is way below NJ and not too far above OK, even though it’s pretty well known that it’s much more expensive to live in California. So what happens if a cost of living index is applied to the values?

Factoring in the cost of living for NJ, CA, & OK

Grabbed some data from 2017 (close enough) from: US Learning

To normalize the data, will take the total spent on salaries and benefits per pupil and divide it by the index/100 to get a more dollar based amount.
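The adjustment can be sketched as follows. This is a minimal Python illustration rather than the code used for this report: the index values are the ones quoted later in the report for CA, NJ and OK, the function name is hypothetical, and the per pupil amount in the example is made up. It also assumes that an index of 100 represents the baseline, which is what dividing by index/100 implies.

```python
# Cost of living index values quoted in this report.
# Assumption: an index of 100 is the baseline.
COST_OF_LIVING_INDEX = {"CA": 136.3, "NJ": 121.2, "OK": 89.1}


def adjust_for_cost_of_living(per_pupil_amount, state):
    """Divide a per pupil dollar amount by (index / 100) for the given state."""
    return per_pupil_amount / (COST_OF_LIVING_INDEX[state] / 100)


# Hypothetical example: the same nominal amount in each state.
nominal = 6000
for state in ("CA", "NJ", "OK"):
    print(state, round(adjust_for_cost_of_living(nominal, state), 2))
```

With the same nominal amount, the adjusted value is lowest where the index is highest, so a state with an index below 100 ends up with an adjusted amount above its nominal one.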

Based on the graphs above, after accounting for cost of living, it seems Oklahoma teachers get paid more than teachers in either New Jersey or California.

Factoring in the class size for NJ, CA, & OK

In theory, class size would have an impact on actual salaries. If compensation is the per student money spent on instructional salaries and benefits, then:

If the per student instructional compensation were the same.

(Note: This doesn’t examine the quality of education or the difficulty of the teaching role as class size increases.)

From:

National Center for Education Statistics

This information is from 2012, so it may be a little old. Four years is a long time. But unless class sizes in these states increased or decreased disproportionately to other states, the results should be similar.

Decided to use the numbers provided in the following way:

(Average class size by level of instruction, Elementary + Average class size by level of instruction, Secondary) divided by 2. Then normalize that using the average US class size.

Results: Once cost of living and class size are factored in, there is a lot of variation across states, but there doesn’t appear to be a big difference between California, New Jersey and Oklahoma in terms of instructional salaries and benefits.

Final Plots and Summary

Plot One

Description One

This plot uses the cleaned data from:

2016 Public Elementary-Secondary Education Finance Data

It graphs per student spending on instructional salaries and benefits for California, New Jersey and Oklahoma. For school districts with between 1,000 and 10,000 students, there is a clear indication that per student spending in Oklahoma is far lower than in California or New Jersey.

It also appears that the variation of what is spent per student is far greater in California and New Jersey than in Oklahoma.

Using this graph alone:

  • Oklahoma teachers are justified in how upset they are regarding salary and benefits provided.

Plot Two

Description Two

Grabbed some data from 2017 (close enough) from: US Learning

Normalized for cost of living:

  • CA - 136.3
  • NJ - 121.2
  • OK - 89.1

The states switch places. California falls to the bottom, New Jersey ends up in the middle, and it appears that Oklahoma teachers are the best paid. The variation in spending per student on instructional salaries and benefits for Oklahoma also appears larger once the cost of living index is applied, since a dollar in Oklahoma goes farther than it does in California or New Jersey.

Plot Three

Description Three

Because the data in plot 1 and plot 2 is NOT strictly reflective of how much a particular teacher gets paid in any of the states, it is necessary to use a normalizer to estimate what the relationships might look like.

Using class size information:

From:
National Center for Education Statistics

This information is from 2012, so it may be a little old. Four years is a long time.

Decided to use the numbers provided in the following way:

Add the average class size for teachers in self-contained classes to the average class size for teachers in departmentalized instruction and divide by 2. Then normalize that using the average US class size.

  • CA - (25.0 + 32.0)/2
  • NJ - (18.5 + 23.9)/2
  • OK - (20.7 + 23.7)/2
  • US - (21.3 + 26.8)/2

and normalizing the data in plot 2 created the plot above.
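The class size step can be sketched as below, using the class sizes listed above. This is a minimal Python illustration, not the code behind the plot. The function names are hypothetical, and the direction of the adjustment is an assumption: the text does not say whether values were multiplied or divided by the class size ratio, so the sketch multiplies, on the reasoning that a teacher with a larger class would receive more if per student compensation were the same.

```python
# (self-contained, departmentalized) average class sizes listed in this report.
CLASS_SIZES = {
    "CA": (25.0, 32.0),
    "NJ": (18.5, 23.9),
    "OK": (20.7, 23.7),
    "US": (21.3, 26.8),
}


def average_class_size(region):
    """Simple mean of the two class size figures for a region."""
    self_contained, departmentalized = CLASS_SIZES[region]
    return (self_contained + departmentalized) / 2


def class_size_factor(state):
    """State average class size relative to the US average."""
    return average_class_size(state) / average_class_size("US")


def normalize_for_class_size(adjusted_per_pupil, state):
    """Scale a cost-of-living-adjusted per pupil amount by the class size factor.

    Assumption: multiplication is used (larger classes imply more money per teacher).
    """
    return adjusted_per_pupil * class_size_factor(state)


for state in ("CA", "NJ", "OK"):
    print(state, round(class_size_factor(state), 3))
```

Under this assumption the factor is above 1 for California and below 1 for New Jersey and Oklahoma, which pulls the three states closer together after the cost of living adjustment.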

Now the data from all three states starts to overlap, indicating that there may be less of a difference in what is spent on teacher salary and benefits by state than the original data alone showed.

Note that the final numbers on the y axis are not real numbers for any of the states, but an indication of spending relative to each other.


Reflection

What were some of the struggles?

What went well?

What was surprising?

What further investigations could be done?